Method, computer program and computing system for optimizing an architectural model of a microprocessor

ABSTRACT

A computer program for optimizing an architectural model of a microprocessor by configuring elements of the instruction set as nodes of the graph. The architectural model of a microprocessor represents an instruction set of the microprocessor. Determination is made whether nodes with identical bit position and value encoding are in the graph. If nodes with the identical bit position and value encoding are present, a path from a source node to a target node is separated into a common node for each node in the graph. The common node is reused to optimize common paths from the graph and the source node is directly connected to the common node in the graph using a forward edge. A back-edge is added from the common node to the source node through the target node and the above steps are recursively repeated until all nodes of the graph are processed.

This is a continuation-in-part of application Ser. No. 12/805,286, filed Jul. 22, 2011. The present application claims priority based on Indian Patent Application No. 1769/CHE/2009, filed Jul. 27, 2009, the entirety of which is incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to the field of microprocessor architectural modeling and design. The present invention specifically relates to a method, computer program and computing system for optimizing an architectural model of a microprocessor.

BACKGROUND OF THE INVENTION

Architecture Description Languages (ADL) are used to describe the architecture of a microprocessor. An ADL description of a processor is typically utilized for designing the processor, software/hardware based verification of the processor behavior, generation of a compiler tool-chain for the processor and for creating a hardware description of the processor. An ADL description describes instructions, operands, addressing modes, functional units and registers of the microprocessor. An ADL description also captures the behavior of the processor, either embedded in the processor description itself or as external functions.

Typically, an ADL captures the information about the processor in a hierarchical fashion, where instruction-groups (also called ‘bundles’) are specified at the top-level of the hierarchy. Each bundle refers to one or more instructions, and each instruction refers to one or more operands and constant values. Similarly, each operand can contain one or more constant values or refer to other operands. Using an ADL, similar instructions, operands and bundles can be grouped together to create a more compact representation of the instruction set of the processor. Such a compact representation of the instruction set is also less prone to errors because duplication of information is avoided. Also, various programming tools for the processor can be automatically generated from an ADL description of the microprocessor.

Furthermore, in order to process the ADL description of the microprocessor in a computer environment, it is necessary to parse the description and create an internal representation of the information contained in the description. The obvious way to represent this information is a tree with instruction groups (bundles) at the root, instructions at the internal nodes, and operands and constant values at the leaves of the tree. Then, a decoder for the instruction set can be constructed to traverse the hierarchy starting from the root node and visit the nodes, matching the corresponding bits at each node.

The problem with the above conventional hierarchical representation of the instruction set architecture is that it involves duplication of nodes when the same operand or constant value is referenced by many instructions or instruction groups in the architecture. Moreover, when decoding a machine instruction using the hierarchical description, the same nodes may have to be compared many times with the input, which results in an increased number of comparisons when implemented in software. Such a large number of comparisons in software requires a similar number of comparators to be implemented in hardware. Therefore, it is desirable to provide a method, computer program and computing system for optimizing an architectural model of a microprocessor, which leads to efficient decoding of the machine code for the microprocessor. Also, such an optimized architectural model results in a hardware circuit that is smaller in size and consumes less power when the model is transformed into hardware for the microprocessor.

OBJECT OF THE INVENTION

An object of the present invention is to provide a method, computer program and computing system for optimizing an architectural model of a microprocessor, which achieves efficient decoding of the machine code for the microprocessor.

Another object of the present invention is to provide a method, computer program and computing system for optimizing an architectural model of a microprocessor, which results in hardware that is smaller in size and consumes less power when the microprocessor is realized in hardware.

SUMMARY OF THE INVENTION

According to one aspect, the present invention, which achieves this objective, relates to a method, computer program and computing system for optimizing an architectural model of a microprocessor, comprising: representing an instruction set of the microprocessor as a graph by representing the elements of the instruction set as nodes of the graph. A determination is made whether nodes with identical bit position and value encoding are present in the graph. If nodes with the identical bit position and value encoding are present, a path from a source node to a target node is separated into a common node for each node in the graph. The common node is reused to optimize common paths out of the graph and the source node is directly connected to the common node in the graph using a forward edge. A back-edge is added from the common node to the source node through the target node and the above steps are recursively repeated until all the nodes of the graph are processed. Thus, the method, computer program and computing system reduce the required number of comparisons due to the common path optimization, which results in efficient decoding of the machine code for the microprocessor.

Furthermore, the elements of the instruction set are defined as instruction groups, instructions and operands. A forward edge is connected from the source node to the target node in the directed graph if there is a reference from the element (bundle, instruction or operand) corresponding to the source node to the element corresponding to the target node in the instruction set. Such a forward edge is labeled with the bit position and value where the element corresponding to the target node is encoded in the element corresponding to the source node. Similarly, a back-edge is introduced from all target nodes to their corresponding source nodes and labeled with a Boolean value (0 or 1). Initially, all back-edges are labeled with zero. A back-edge ending in a node is called an input back-edge, whereas a back-edge originating from a node is called an output back-edge.

In addition, the node corresponding to the top-level element in the instruction set contains no output back-edges and one or more input back-edges. The node corresponding to a leaf element in the instruction set hierarchy contains no input back-edges and one or more output back-edges, whereas other nodes contain one or more input and output back-edges. The label on the output back-edge of a node is changed from 0 to 1 when all its input back-edges are labeled with 1. The nodes corresponding to the elements present at the leaf level in the instruction set hierarchy are reused in the graph instead of creating a new node for each reference to them. Each node representing a leaf element includes a comparator to compare input bits with the values represented by the node. The output value of the comparator is set to the back-edge corresponding to the bit position of the input. Similarly, each node corresponding to a non-leaf element of the instruction set logically ANDs all its input back-edges and the result of the logical AND is set on the output back-edges of the node.

When decoding an instruction of the processor, a machine code of a specified number of bits is passed to the node corresponding to the top-level element in the instruction set. Then, these bits are propagated through the graph to all nodes connected to the top-level node, with each node obtaining a specified number of bits corresponding to the bit position where it is encoded. When a part of the machine instruction is matched by a node corresponding to a leaf element in the instruction set, the node sets all its output back-edges to 1. The nodes at the target of these output back-edges receive the information and propagate it through their back-edges, resulting in a path being selected from the nodes corresponding to the leaf elements to the top-level element, thereby decoding the input machine instruction. If no match occurs at any node corresponding to a leaf element of the instruction set, or only a partial match occurs with the input machine code, then a path to the top-level node along the back-edges is not available, which indicates that the input machine code is not a valid machine instruction for the processor under consideration.

Moreover, the graph is optimized by separating each path passing from a parent node to a child node into a common node for each node in the graph, with an incoming edge from the parent node and an outgoing edge to the child node. Due to the common-path optimization and the presence of back-edges, the number of comparisons required to match instructions is reduced, which results in efficient decoding of instructions. This optimization also reduces the number of data paths and comparators when the architecture of the processor is realized in hardware, which results in a hardware circuit of a smaller size that consumes less power.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be discussed in greater detail with reference to the accompanying Figures.

FIG. 1 shows a flow diagram of the steps of a method for optimizing an instruction set architectural model of a microprocessor, in accordance with an exemplary embodiment of the present invention;

FIG. 2 illustrates a flow diagram of the steps of constructing a directed graph of the elements of the instruction set of the microprocessor, in accordance with an exemplary embodiment of the present invention;

FIG. 3 illustrates an exemplary hierarchical representation of the instruction set of the microprocessor with a bundle and two instructions, in accordance with an exemplary embodiment of the present invention;

FIG. 4 illustrates an initial directed graph of the instruction set of the microprocessor as shown in FIG. 3, in accordance with an exemplary embodiment of the present invention;

FIG. 5 illustrates a graph node corresponding to a leaf node of the instruction set of the microprocessor, in accordance with an exemplary embodiment of the present invention;

FIG. 6 illustrates a graph node corresponding to a non-leaf node of the instruction set of the microprocessor, in accordance with an exemplary embodiment of the present invention;

FIG. 7 illustrates the directed graph of the instruction set of FIG. 4 after initial optimization is applied to an ADD_INSN node, in accordance with an exemplary embodiment of the present invention;

FIG. 8 illustrates a final optimized version of the directed graph of the instruction set of FIG. 4, in accordance with an exemplary embodiment of the present invention;

FIG. 9 illustrates a state of the directed graph of the instruction set of FIG. 4 when decoding an ADD instruction of the microprocessor, in accordance with an exemplary embodiment of the present invention;

FIG. 10 illustrates a schematic block diagram of a microprocessor design incorporating the optimized architecture model, in accordance with the exemplary embodiment of the present invention; and

FIG. 11 illustrates a simplified block diagram of an exemplary hardware implementation of a computing system 1100, constructed and operative for synthesizing a microprocessor design incorporating the optimized architecture model, in accordance with an exemplary embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Referring to FIG. 1, a flow diagram of the steps of a method for optimizing an instruction set architectural model of a microprocessor is illustrated, in accordance with an exemplary embodiment of the present invention. In step 100, the instruction set of the microprocessor is converted into a directed graph ‘G’ according to the steps of FIG. 2, which illustrates a flow diagram of the steps of constructing a directed graph of the elements of the instruction set of the microprocessor, in accordance with an exemplary embodiment of the present invention. In step 200, a node ‘N’ corresponding to the top-level element ‘E’ in the instruction set hierarchy is created in the graph ‘G’.

In step 210, the element ‘E’ is set as the current element considered for processing. As shown in step 220, each leaf element ‘EL’, i.e. a child of the element ‘E’, is considered. If a node ‘NL’ corresponding to the leaf element ‘EL’ already exists in the graph ‘G’, then a forward edge is added from the node ‘N’ to the node ‘NL’. This forward edge is labeled with the bit position and value corresponding to the encoding of the leaf element ‘EL’ in the current element. Similarly, a back-edge is added from the node ‘NL’ to the node ‘N’. If there is no node in the graph ‘G’ corresponding to the leaf element ‘EL’, then a new node ‘NL’ is created in the graph ‘G’. Moreover, a forward edge from the node ‘N’ to the node ‘NL’ and a back-edge from the node ‘NL’ to the node ‘N’ are added.

In step 230, each non-leaf element ‘ENL’ of the current element being processed is considered. A new node ‘NNL’ is created in the graph ‘G’ corresponding to the non-leaf element ‘ENL’. A forward edge is added in the graph ‘G’ from the node ‘N’ to the node ‘NNL’, where the forward edge is labeled with the bit position and value corresponding to the encoding of the non-leaf element ‘ENL’ in the current element. Similarly, a back-edge is added from the node ‘NNL’ to the node ‘N’. In step 240, all the child elements ‘Ec’ of the current element are processed by recursively repeating step 220. The complete graph of the instruction set is created when all the elements in the instruction set hierarchy are processed.
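The construction of steps 200 to 240 can be sketched in Python as follows. This is only an illustrative reading of the flow of FIG. 2, not the claimed implementation. The names `Element`, `Graph` and `build_graph` are hypothetical, as are the path-style names used for non-leaf nodes. The labels (0,16) and (0,8,0) and the operand positions are taken from the description; the SUB opcode value 1 and the label (0,4) from an operand to REG are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """One element of the instruction set hierarchy."""
    name: str
    # Each child is (Element, label); label = (start_bit, num_bits[, value]).
    children: list = field(default_factory=list)

@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    forward: list = field(default_factory=list)  # (source, target, label)
    back: dict = field(default_factory=dict)     # (from_node, to_node) -> 0 or 1

def build(graph, element, node):
    """Steps 220-240: add nodes and edges for every child of `element`."""
    for child, label in element.children:
        if not child.children:
            # Step 220: a leaf element gets one shared node, reused if present.
            child_node = child.name
        else:
            # Step 230: a non-leaf element gets a new node per reference.
            child_node = f"{node}/{child.name}"
        graph.nodes.add(child_node)
        graph.forward.append((node, child_node, label))
        graph.back[(child_node, node)] = 0   # all back-edges start at zero
        if child.children:
            build(graph, child, child_node)  # step 240: recurse into children

def build_graph(top):
    """Step 200: create the top-level node, then process the hierarchy."""
    graph = Graph()
    graph.nodes.add(top.name)
    build(graph, top, top.name)
    return graph

# Hierarchy of FIG. 3 (SUB opcode value and REG label are assumed).
reg = Element("REG")
op1 = Element("OP1", [(reg, (0, 4))])
op2 = Element("OP2", [(reg, (0, 4))])
add_insn = Element("ADD_INSN", [(Element("ADD_OPCODE"), (0, 8, 0)),
                                (op1, (8, 4)), (op2, (12, 4))])
sub_insn = Element("SUB_INSN", [(Element("SUB_OPCODE"), (0, 8, 1)),
                                (op1, (8, 4)), (op2, (12, 4))])
bundle = Element("BUNDLE", [(add_insn, (0, 16)), (sub_insn, (0, 16))])
graph = build_graph(bundle)
```

In this sketch the single REG node receives four forward edges, one from each operand reference, which mirrors the reuse of leaf nodes described above.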

Returning to step 110 in FIG. 1, an unprocessed node ‘N’ of the directed graph ‘G’ is considered. In step 120, a path ‘P’ from a parent node ‘Np’ to a child node ‘Nc’ through the node ‘N’ is considered. In step 130, a check is made whether a node ‘K’ with the same bit position and value encoding is already present in the graph ‘G’. If it is not present, then a new node ‘K’ is created in the graph ‘G’. Thereafter, in step 140, the child node ‘Nc’ is replaced with the node ‘K’. A forward edge is added from the node ‘Np’ to the node ‘K’, whereas the forward edge from the node ‘Np’ to the node ‘N’ is removed. So, the node ‘Np’ is directly connected to the node ‘K’ in the forward direction and through ‘N’ as part of the back-edges. In step 150, a check is made whether all nodes in the graph ‘G’ have been processed without any new changes to the graph ‘G’. If there are any changes in the graph ‘G’, then the processing is repeated from step 110. The processing terminates when there are no more changes to the graph ‘G’ since the previous iteration.
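One possible reading of steps 110 to 150 is a fixed-point loop that bypasses every intermediate node in the forward direction. The Python sketch below is an interpretation made for illustration only. It assumes that forward edges are stored as tuples (source, target, start bit, number of bits, value), that edge positions are relative to the parent so that offsets add along a path, and that storing edges in a set plays the role of reusing the common node ‘K’ when an identical encoding already exists. The function name `optimize` and the single shared OP1 and OP2 nodes are assumptions. Back-edges are not modeled here because the optimization leaves them in place.

```python
def optimize(forward_edges):
    """Bypass intermediate nodes until the set of forward edges stops changing."""
    edges = set(forward_edges)
    changed = True
    while changed:                       # step 150: repeat until no change
        changed = False
        for node in sorted({target for _, target, *_ in edges}):   # step 110
            incoming = [e for e in edges if e[1] == node]
            outgoing = [e for e in edges if e[0] == node]
            if not incoming or not outgoing:
                continue                 # top-level or leaf node: nothing to bypass
            for parent, _, s1, _n1, _v1 in incoming:       # step 120: path Np -> N -> Nc
                for _, child, s2, n2, v2 in outgoing:
                    # Steps 130-140: the set keeps one edge per identical encoding.
                    edges.add((parent, child, s1 + s2, n2, v2))
            edges -= set(incoming) | set(outgoing)
            changed = True
    return edges

# Forward edges of an initial graph in the style of FIG. 4 (values partly assumed).
initial = [
    ("BUNDLE", "ADD_INSN", 0, 16, None),
    ("BUNDLE", "SUB_INSN", 0, 16, None),
    ("ADD_INSN", "ADD_OPCODE", 0, 8, 0),
    ("ADD_INSN", "OP1", 8, 4, None),
    ("ADD_INSN", "OP2", 12, 4, None),
    ("SUB_INSN", "SUB_OPCODE", 0, 8, 1),
    ("SUB_INSN", "OP1", 8, 4, None),
    ("SUB_INSN", "OP2", 12, 4, None),
    ("OP1", "REG", 0, 4, None),
    ("OP2", "REG", 0, 4, None),
]
optimized = optimize(initial)
```

Under these assumptions the ten initial forward edges collapse to four direct edges from BUNDLE: one to each opcode node and two to REG, at bit positions 8 and 12.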

Referring to FIG. 3, an exemplary hierarchical representation of the instruction set of the microprocessor with a bundle and two instructions is illustrated, in accordance with an exemplary embodiment of the present invention. In this hierarchical representation, the instruction set of the microprocessor can be represented with the bundle and two instructions, namely ADD_INSN and SUB_INSN. Each instruction contains an operation code (ADD_OPCODE and SUB_OPCODE) and two register operands (OP1, OP2). The operation codes ADD_OPCODE and SUB_OPCODE are encoded at identical positions in the ADD and SUB instructions, respectively.

Similarly, the two operands OP1 and OP2 are encoded at identical positions in both the instructions. The operands OP1 and OP2 are of type REG, which refers to the three registers of the processor, i.e. R1, R2 and R3. The bit positions, where an element is encoded in the machine instruction, are indicated by labeling an edge using a start bit position, the number of bits and an optional value. For example, the edge from the bundle to the instruction ADD_INSN is labeled using (0,16), which indicates that the instruction ADD_INSN is encoded starting at bit 0 for 16 bits in the instruction word. As another example, the edge from the instruction ADD_INSN to the operation code ADD_OPCODE is labeled as (0,8,0), which indicates that the operation code ADD_OPCODE with value 0 is encoded at bit 0 for 8 bits.
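The edge labels described above can be illustrated with a small Python sketch that extracts the field named by a label and checks the optional value. The description does not state the bit numbering, so the sketch assumes that bit 0 is the most significant bit of a 16-bit instruction word. The helper names `extract_bits` and `edge_matches` are hypothetical.

```python
def extract_bits(word, start, nbits, width=16):
    """Return the nbits-wide field starting at bit `start`.

    Assumption: bit 0 is the most significant bit of a `width`-bit word.
    """
    shift = width - start - nbits
    return (word >> shift) & ((1 << nbits) - 1)

def edge_matches(word, label, width=16):
    """Check a label (start, nbits) or (start, nbits, value) against a word."""
    start, nbits, *value = label
    bits = extract_bits(word, start, nbits, width)
    # A label without a value only selects bits, so it always matches.
    return not value or bits == value[0]

word = 0x0013                              # opcode 0x00, operand fields 1 and 3
opcode = extract_bits(word, 0, 8)          # 0
first_operand = extract_bits(word, 8, 4)   # 1
second_operand = extract_bits(word, 12, 4) # 3
```

With this convention, the label (0,8,0) matches the word 0x0013 but not 0x0113, while (0,16) matches any word because it carries no value.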

FIG. 4 illustrates an initial directed graph of the instruction set of the microprocessor as shown in FIG. 3, in accordance with an exemplary embodiment of the present invention. In the initial directed graph, the nodes corresponding to OP1 and OP2 are connected by forward edges to a single node corresponding to REG, and there is a back-edge from the REG node to OP1 and from REG to OP2 (the back-edges are represented with dotted lines). FIG. 5 illustrates a graph node corresponding to a leaf node of the instruction set of the microprocessor, in accordance with an exemplary embodiment of the present invention. Here, this node contains a comparator that matches input bits with the values that the node can carry. The comparator outputs ‘1’ if the input bits match any of the values and outputs ‘0’ if the input bits do not match any of the values of the node. The output of the comparator is set to the back-edges of the node in such a way that each back-edge is set if the corresponding bits in the input match the node's value.
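The leaf node of FIG. 5 can be modeled in Python roughly as follows. This is a software sketch of the comparator behavior, not the hardware itself. The class name `LeafNode` is hypothetical, and the register encodings 1, 2 and 3 for R1, R2 and R3 are assumed because the description does not give them.

```python
from dataclasses import dataclass, field

@dataclass
class LeafNode:
    name: str
    values: frozenset                               # values the node can carry
    back_edges: dict = field(default_factory=dict)  # parent name -> 0 or 1

    def compare(self, parent, bits):
        """Set the back-edge towards `parent` to 1 if `bits` matches any value."""
        output = 1 if bits in self.values else 0
        self.back_edges[parent] = output
        return output

reg = LeafNode("REG", frozenset({1, 2, 3}))  # assumed encodings of R1, R2, R3
reg.compare("OP1", 1)   # bits for the first operand match R1
reg.compare("OP2", 7)   # 7 is not a register value in this sketch
```

Each back-edge is driven separately, so one REG node can report a match towards OP1 and a mismatch towards OP2 for the same instruction word.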

Referring to FIG. 6, a graph node corresponding to a non-leaf node of the instruction set of the microprocessor is illustrated, in accordance with an exemplary embodiment of the present invention. Here, the values of the input back-edges are logically ANDed together and the result is set to the output back-edges. FIG. 7 illustrates the directed graph of the instruction set of FIG. 4 after initial optimization is applied to an ADD_INSN node, in accordance with an exemplary embodiment of the present invention. Here, the forward edges from the instruction ADD_INSN to the operation code ADD_OPCODE and the operands OP1 and OP2 are removed and replaced by direct edges from the BUNDLE node. Therefore, for each node ‘N’ in the graph, each path traveling from a parent node ‘Np’ to a child node ‘Nc’ is separated into a common node, with an incoming edge from the node ‘Np’ and an outgoing edge to the node ‘Nc’. Thus, the common paths are optimized out of the graph by reusing common nodes.

FIG. 8 illustrates a final optimized version of the directed graph of the instruction set of FIG. 4, in accordance with an exemplary embodiment of the present invention. The graph is optimized by replacing the BUNDLE->OP1->REG path with a direct edge from BUNDLE to the register REG. Similarly, the BUNDLE->OP2->REG path is replaced with a direct edge from BUNDLE to the register REG. While processing the BUNDLE->ADD_INSN->OP1->REG path or the BUNDLE->ADD_INSN->OP2->REG path, a check is made whether an identical path already exists. Since a BUNDLE->REG path exists, the BUNDLE->ADD_INSN->OP1->REG path or the BUNDLE->ADD_INSN->OP2->REG path is removed and a back-edge is added from the operands OP1 and OP2 to the instruction ADD_INSN. Likewise, while processing the BUNDLE->SUB_INSN->OP1->REG path or the BUNDLE->SUB_INSN->OP2->REG path, a check is made whether an identical path already exists. Since a BUNDLE->REG path exists, the BUNDLE->SUB_INSN->OP1->REG path or the BUNDLE->SUB_INSN->OP2->REG path is removed and a back-edge is added from the operands OP1 and OP2 to the instruction SUB_INSN.

FIG. 9 illustrates a state of the directed graph of the instruction set of FIG. 4 when decoding an ADD instruction of the microprocessor, in accordance with an exemplary embodiment of the present invention. The back-edges from a node, at which a part of the machine instruction matches, are shown in bold. Consider the instruction ADD R1, R3. Note that the matches of the operands OP1 and OP2 with the respective registers R1 and R3 are propagated to both the ADD and SUB instructions. However, since only the operation code OPCODE of the instruction ADD matches the given instruction, only the back-edge from the instruction ADD_INSN to the bundle is enabled, which indicates that the input instruction is an ADD instruction.

Furthermore, the 16 bits corresponding to this instruction are passed to the BUNDLE node. Then, the bits corresponding to the operation code OPCODE of the instruction (i.e., bits 0 to 7) are passed from the BUNDLE node to the OPCODE node, where they are matched against the opcode values in the node. Since the opcode value matches the operation code OPCODE of the ADD instruction (i.e., zero), the back-edge from the node to the instruction ADD_INSN is set to 1. Thereafter, the bits corresponding to the first register (i.e., bits 8 to 11) are passed from the BUNDLE node to the REG node. The REG node compares the input value with the possible values that it may carry. The input matches the register R1 and thus, the back-edge from the register REG to the operand OP1 is set to 1. This in turn causes the back-edges from the operand OP1 to the instruction ADD_INSN and from the operand OP1 to the instruction SUB_INSN to be set to 1.

Similarly, the bits corresponding to the second register (i.e., bits 12 to 15) are then compared at the REG node. They match the value for R3 and thus, the back-edge from the register REG to the operand OP2 is set to 1. This causes the back-edges from the operand OP2 to the instruction ADD_INSN and from the operand OP2 to the instruction SUB_INSN to be set to 1. In this example, all the input back-edges of the instruction ADD_INSN are set to 1. This causes the output back-edge from the instruction ADD_INSN to BUNDLE to be set to 1, which indicates that the instruction ADD_INSN is matched. Even though the input back-edges from the operands OP1 and OP2 to the instruction SUB_INSN are set to 1, the back-edge from the operation code OPCODE of the instruction SUB is set to zero. Therefore, the output back-edge from the instruction SUB_INSN to BUNDLE is set to zero, which indicates that the instruction SUB_INSN is not matched with the input machine code.
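The decoding of ADD R1, R3 described above can be traced with the following Python sketch of the optimized graph of FIG. 8. It is an illustration under stated assumptions rather than the claimed decoder: bit 0 is taken as the most significant bit of the 16-bit word, the registers R1 to R3 are assumed to be encoded as 1 to 3, and the SUB opcode is assumed to be 1 (only the ADD opcode value of zero is given in the description). The names `bits` and `decode` are hypothetical.

```python
REG_VALUES = {1, 2, 3}                      # assumed encodings of R1, R2, R3
OPCODES = {"ADD_INSN": 0, "SUB_INSN": 1}    # SUB value is an assumption

def bits(word, start, nbits, width=16):
    """Field of `word` at bit `start` (bit 0 assumed most significant)."""
    return (word >> (width - start - nbits)) & ((1 << nbits) - 1)

def decode(word):
    """Return the back-edge value from each instruction node to BUNDLE."""
    opcode = bits(word, 0, 8)
    # Leaf comparators in the REG node drive one back-edge per operand.
    reg_to_op1 = int(bits(word, 8, 4) in REG_VALUES)
    reg_to_op2 = int(bits(word, 12, 4) in REG_VALUES)
    # OP1 and OP2 each have a single input back-edge, so the AND of their
    # inputs is that value; it is forwarded to both ADD_INSN and SUB_INSN.
    op1_out, op2_out = reg_to_op1, reg_to_op2
    result = {}
    for insn, value in OPCODES.items():
        opcode_to_insn = int(opcode == value)   # leaf comparator of the opcode node
        # Non-leaf node: logical AND of all input back-edges.
        result[insn] = opcode_to_insn & op1_out & op2_out
    return result

add_r1_r3 = decode(0x0013)   # opcode 0, operand fields 1 and 3
invalid = decode(0x0010)     # second operand field 0 is not a register here
```

For the word 0x0013 only the ADD_INSN back-edge reaches BUNDLE with the value 1. For 0x0010 no instruction node outputs 1, which corresponds to the case of an invalid machine instruction.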

FIG. 10 illustrates a schematic block diagram of a microprocessor design incorporating the optimized architecture model, in accordance with the exemplary embodiment of the present invention. The microprocessor 1000 is composed of a processor core 1002, an on-chip memory 1004 (including program memory), and an external interface 1006. The microprocessor 1000 is fabricated using the customized design obtained using the computer program of the present invention, which is subsequently synthesized into a logic level representation, and then reduced to a physical device using compilation, layout and fabrication techniques well known in the semiconductor arts. In particular, the method as previously described herein and illustrated in FIGS. 1-9 can readily be implemented into the microprocessor 1000.

The processor core 1002 is embedded with an instruction fetch 1022, an instruction decoder 1024, and a control logic unit 1026 associated therewith. The instruction fetch 1022 is adapted to retrieve, from the memory 1004, the instruction set of the previously described method for optimizing the architectural model of the microprocessor, where the instruction set of the previously described method is stored in the memory 1004 accessible by the processor core 1002. The retrieved instruction set of the previously described method is traversed by the instruction decoder 1024, so that the control logic unit 1026 is operative to facilitate execution of the instruction set of the previously described method.

It will be appreciated by one skilled in the art that the microprocessor 1000 may contain any commonly available peripheral such as serial communications devices, parallel ports, timers, counters, high current drivers, analog to digital (A/D) converters, digital to analog (D/A) converters, interrupt processors, LCD drivers, memories and other similar devices. Further, the processor 1000 may also include custom or application specific circuitry.

FIG. 11 illustrates a simplified block diagram of an exemplary hardware implementation of a computing system 1100, constructed and operative for synthesizing a microprocessor design incorporating the optimized architecture model, in accordance with an exemplary embodiment of the present invention. The computing system 1100, capable of synthesizing the microprocessor design using the methodology of microprocessor architecture model optimization discussed with respect to FIGS. 1-9, is described herein. The computing device 1100 comprises a motherboard 1102 having a central processing unit (CPU) 1104, random access memory (RAM) 1106, and a memory controller 1108. A storage device 1110 (such as a hard disk drive or CD-ROM), input device 1112 (such as a keyboard or mouse), and display device 1114 (such as a CRT, plasma, or TFT display), as well as buses necessary to support the operation of the host and peripheral components, are also provided. The aforementioned descriptions and synthesis design for implementation of the method for optimizing the architectural model of the microprocessor 1000 are stored in the form of an object code representation of a computer program in the RAM 1106 and/or storage device 1110 for use by the CPU 1104 during design synthesis, the latter being well known in the computing arts. The processing device 1104 is in communication with the computer memory storage device 1110, and is configured for design implementation of the method for optimizing the architectural model of the microprocessor 1000.

The flowchart and block diagrams in the FIGS. 1-11 illustrate the architecture, functionality, and operation of possible implementations of methods and systems according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The present invention is not limited to the type, number or complexity of peripherals and other circuitry that may be combined using the computer program, method and computing system.

We claim:
 1. A tangible non-transitory computer system for optimizing an architectural model of a microprocessor, the computer system comprising: a tangible non-transitory computer-readable medium, a computer program being embodied in said computer-readable medium, wherein said computer program configures the computer system to execute the steps of: representing an instruction set of said microprocessor as a graph by configuring elements of said instruction set as a plurality of nodes of said graph; determining whether said plurality of nodes with identical bit position and value encoding is present in said graph; when said plurality of nodes with the identical bit position and value encoding is present, separating a path from a source node to a target node into a common node for each node in said graph and reusing the common node to optimize common paths out of said graph; directly connecting said source node to the common node in said graph using a forward edge; adding a back-edge from the common node to said source node through said target node and recursively repeating the above steps until all the nodes of said graph are processed; and creating a new node in said graph when said plurality of nodes with the identical bit position and value encoding is not present in said graph.
 2. The computer system as claimed in claim 1, wherein said forward edge is labeled with bit position and value encoding.
 3. The computer system as claimed in claim 2, wherein said forward edge is added from a source node to a target node in said graph if the elements corresponding to said source node refers to the elements corresponding to said target node.
 4. The computer system as claimed in claim 1, wherein said back-edge is labeled with a Boolean value.
 5. The computer system as claimed in claim 1, wherein the elements of said instruction set are instruction groups, instructions and operands.
 6. The computer system as claimed in claim 1, wherein the elements represented by said target node is encoded in the elements represented by said source node.
 7. The computer system as claimed in claim 1, wherein the elements of said instruction set are defined as a leaf element and non-leaf element.
 8. The computer system as claimed in claim 7, wherein said plurality of nodes corresponding to the leaf element contains a comparator that matches input bits with values of said plurality of nodes to set the output to said back-edge.
 9. The computer system as claimed in claim 7, wherein said plurality of nodes corresponding to the non-leaf element logically ANDs the values of said back-edge.